Abstract



In quantum physics the direct observables are probabilities of events. We ask how observed
probabilities must be combined to achieve what we call maximum predictive power.
According to this concept the accuracy of a prediction must only depend on the number of
runs whose data serve as input for the prediction. We transform each probability to an
associated variable whose uncertainty interval depends only on the amount of data and strictly
decreases with it. We find that for a probability which is a function of two other probabilities
maximum predictive power is achieved when linearly summing their associated variables and
transforming back to a probability. This recovers the quantum mechanical superposition
principle.



1 Introduction

Quantum theory is not yet understood as well as e.g. classical mechanics or special relativity. 
Classical mechanics coincides well with our intuition and so is rarely questioned. Special relativity 
runs counter to our immediate insight, but can easily be derived by assuming constancy of the 
speed of light for every observer. And that assumption may be made plausible by epistemological 
arguments [1]. Quantum theory on the other hand demands two premises. First, it wants us to
give up determinism for the sake of a probabilistic view. In fact, this seems unavoidable in a 
fundamental theory of prediction, because any communicable observation can be decomposed into 
a finite number of bits. So predictions therefrom always have limited accuracy, and probability 
enters naturally. More disturbing is the second premise: Quantum theory wants us to give up 
the sum rule of probabilities by requiring interference instead. However, the sum rule is deeply 
ingrained in our thought, because of its roots in counting and the definition of sets: Define sets
with no common elements, then define the set which joins them all. The number of elements in
this latter set is just the sum of the numbers of elements of the individual sets. When deriving the notion of
probability from the relative frequency of events we are thus immediately led to the sum rule, such
that any other rule appears inconceivable. And this may be the reason why we have difficulties
accepting the quantum theoretical rule, where probabilities are combined by calculating the squared
modulus of the sum of the complex square roots of the probabilities. In this situation two views are
possible. We may either consider the quantum theoretical rule as a peculiarity of nature. Or, 
we may conjecture that the quantum theoretical rule has something to do with how we organize
data from observations into quantities that are physically meaningful to us. We want to adopt the
latter position. Therefore we seek to establish a grasp of the quantum theoretical rule with the
general idea in mind that, given the probabilistic paradigm, there may exist an optimal strategy
of prediction, quite independent of traditional physical concepts, but resting on what one can
deduce from a given amount of information. We will formulate elements of such a strategy with
the aim of achieving maximum predictive power.

This is a slightly revised version of a paper which appeared in Int. J. Theor. Physics 33, 171-178 (1994).



2 Representing Knowledge from Probabilistic Data 

Any investigative endeavour rests upon one natural assumption: More data from observations 
will lead to better knowledge of the situation at hand. Let us see whether this holds in quantum 
experiments. The data are relative frequencies of events. From these we deduce probabilities 
from which in turn we derive the magnitudes of physical quantities. As an example take an 
experiment with two detectors, where a click is registered in either the one or the other. (We 
exclude simultaneous clicks for the moment.) Here, only one probability is measurable, e.g. the 
probability $p_1$ of a click in detector 1. After $N$ runs we have $n_1$ counts in detector 1 and $n_2$ counts
in detector 2, with $n_1 + n_2 = N$. The probability $p_1$ can thus be estimated as

$$p_1 = \frac{n_1}{N}, \tag{1}$$

with the uncertainty interval

$$\Delta p_1 = \sqrt{\frac{p_1 (1 - p_1)}{N}}. \tag{2}$$

From $p_1$ the physical quantity $\chi(p_1)$ is derived. Its uncertainty interval is

$$\Delta\chi = \left|\frac{\partial\chi}{\partial p_1}\right| \Delta p_1. \tag{3}$$

The accuracy of $\chi$ is given by the inverse of $\Delta\chi$. With the above assumption we expect it to
increase with each additional run, because we get additional data. Therefore, for any $N$, we
expect

$$\Delta\chi(N + 1) < \Delta\chi(N). \tag{4}$$

However, this inequality cannot be true for an arbitrary function $\chi(p_1)$. In general $\Delta\chi$ will
fluctuate and only decrease on average with increasing $N$. To see this take a theory A which
relates physical quantity and probability by $\chi_A = p_1$. In an experiment of $N = 100$ runs and
$n_1 = 90$ we get $\Delta\chi_A(100) = .030$. By taking into account the data from one additional run, where
detector 2 happened to click, we have $\Delta\chi_A(101) = .031$. The differences may appear marginal,
but nevertheless the accuracy of our estimate for $\chi_A$ has decreased although we incorporated
additional data. So our original assumption does not hold. This is worrisome as it implies that
a prediction based on a measurement of $\chi_A$ may be more accurate if the data of the last run are
not included. Let us contrast this to theory B, which connects physical quantity and probability
by $\chi_B = p_1^6$. With $N$ and $n_1$ as before we have $\Delta\chi_B(100) = .106$. Incorporation of the data from
the additional run leads to $\Delta\chi_B(101) = .104$. Now we obviously don't question the value of the
last run, as the accuracy of our estimate has increased.
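
The four figures quoted for theories A and B can be checked with a short sketch. It assumes that the uncertainty interval of the estimated probability is the binomial standard error, $\Delta p_1 = \sqrt{p_1(1-p_1)/N}$, and that the uncertainty of the derived quantity follows from first-order error propagation. The function names are illustrative and not part of the paper.

```python
import math


def delta_p(n1: int, n_runs: int) -> float:
    """Binomial standard error of the estimate p1 = n1 / N (assumed form of eq.(2))."""
    p1 = n1 / n_runs
    return math.sqrt(p1 * (1.0 - p1) / n_runs)


def delta_chi_a(n1: int, n_runs: int) -> float:
    """Theory A: chi_A = p1, so |d chi / d p1| = 1."""
    return delta_p(n1, n_runs)


def delta_chi_b(n1: int, n_runs: int) -> float:
    """Theory B: chi_B = p1**6, so |d chi / d p1| = 6 * p1**5."""
    p1 = n1 / n_runs
    return 6.0 * p1 ** 5 * delta_p(n1, n_runs)


# 90 clicks in detector 1; the 101st run is a click in detector 2.
for n_runs in (100, 101):
    print(n_runs, delta_chi_a(90, n_runs), delta_chi_b(90, n_runs))
```

Under these assumptions the uncertainty of theory A grows from run 100 to run 101, while that of theory B shrinks, in line with the values given in the text.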



The lesson to be learnt from the two examples is that the specific functional dependence of 
a physical quantity on the probability (or several probabilities if it is derived from a variety of 
experiments) determines whether our knowledge about the physical quantity will increase with 
additional experimental data, and that this also applies to the accuracy of our predictions. This 
raises the question what quantities we should be interested in to make sure that we get to know 
them more accurately by doing more experiments. From a statistical point of view the answer is 
straightforward: choose variables whose uncertainty interval strictly decreases, and simply define 
them as physical. And from a physical point of view? Coming from classical physics we may 
have a problem, as concepts like mass, distance, angular momentum, energy, etc. are suggested 
as candidates for physical quantities. But when coming from the phenomenology of quantum 
physics, where all we ever get from nature is random clicks and count rates, a definition of physical 
quantities according to statistical criteria may seem more reasonable, simply because there is no 
other guideline as to which random variables should be considered physical. 

Pursuing this line of thought we want to express experimental results by random variables 
whose uncertainty interval strictly decreases with more data. When using them in predictions, 
which are also expressed by variables with this property, predictions should automatically become 
more accurate with more data input. Now a few trials will show that there are many functions 
$\chi(p_1)$ whose uncertainty interval decreases with increasing $N$ (eq.(3)). We want to choose the one
with maximum predictive power. The meaning of this term becomes clear when realizing that in
general $\Delta\chi$ depends on $N$ and on $n_1$ (via $p_1$). These two numbers have a very different status.
The number of runs, $N$, is controlled by the experimenter, while the number of clicks, $n_1$, is solely
due to nature. Maximum predictive power then means to eliminate nature's influence on $\Delta\chi$. For
then we can know $\Delta\chi$ even before having done any experimental runs, simply upon deciding how
many we will do. From eq.(3) we thus get

$$\sqrt{N}\,\Delta\chi = \left|\frac{d\chi}{dp_1}\right| \sqrt{p_1 (1 - p_1)} = \text{constant}, \tag{5}$$

which results in

$$\chi = C \arcsin(2 p_1 - 1) + D, \tag{6}$$
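
A small numerical sketch can illustrate that the arcsine transformation removes the dependence on the number of clicks. It assumes the binomial standard error for the probability estimate, first-order error propagation, and the choice $C = 1$, $D = \pi/2$. The derivative is taken by a central finite difference so that the cancellation is not put in by hand. The names `chi` and `delta_chi` are illustrative.

```python
import math


def chi(p1: float, c: float = 1.0, d: float = math.pi / 2) -> float:
    """Associated variable chi = C * arcsin(2 p1 - 1) + D."""
    return c * math.asin(2.0 * p1 - 1.0) + d


def delta_chi(n1: int, n_runs: int, h: float = 1e-6) -> float:
    """Uncertainty of chi from error propagation with a numerical derivative.

    Valid for 0 < n1 < n_runs, so that p1 +/- h stays inside (0, 1).
    """
    p1 = n1 / n_runs
    dchi_dp = (chi(p1 + h) - chi(p1 - h)) / (2.0 * h)
    dp = math.sqrt(p1 * (1.0 - p1) / n_runs)  # assumed binomial standard error
    return abs(dchi_dp) * dp


# For fixed N the result should not depend on n1 and should be close to 1/sqrt(N).
for n1 in (10, 50, 90):
    print(n1, delta_chi(n1, 100), 1.0 / math.sqrt(100))
```

With these choices the uncertainty depends on the number of runs alone, so it can be stated before any data are taken.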

where $C$ and $D$ are real constants. The inverse is

$$p_1 = \frac{1 + \sin\left(\frac{\chi - D}{C}\right)}{2}, \tag{7}$$

showing that the probability is periodic in $\chi$. Aside from the linear transformations provided by
$C$ and $D$ any other smooth function $\alpha(\chi)$ in real or complex spaces will also fulfill requirement
(5) when equally sized intervals in $\chi$ correspond to equal line lengths along the curve $\alpha(\chi)$. One
particular curve is

$$\alpha(\chi) = \sin\left(\frac{\chi}{2}\right) e^{i\chi/2}, \tag{8}$$

which is a circle in the complex plane with center at $i/2$. It exhibits the property $|\alpha|^2 = p_1$ known
from quantum theory (taking $C = 1$ and $D = \pi/2$ in eq.(7)). But note that for instance the function $\beta = \sin(\chi/2)$ does not fulfill the
requirement that the accuracy only depend on N. Therefore the complex phase factor in eq.(8) 
is necessary M M . 
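
The role of the phase factor can be illustrated numerically. The sketch below assumes the curve $\alpha(\chi) = \sin(\chi/2)\,e^{i\chi/2}$ and compares it with the real function $\beta(\chi) = \sin(\chi/2)$. It checks that $\alpha$ lies on a circle of radius 1/2 around $i/2$ and that its arc length per unit $\chi$ is constant, whereas that of $\beta$ is not. The helper `speed` is an illustrative name for a finite-difference estimate of $|d f/d\chi|$.

```python
import cmath
import math


def alpha(chi: float) -> complex:
    """Complex representative: sin(chi/2) * exp(i chi/2)."""
    return math.sin(chi / 2.0) * cmath.exp(1j * chi / 2.0)


def beta(chi: float) -> float:
    """Real alternative without the phase factor."""
    return math.sin(chi / 2.0)


def speed(f, chi: float, h: float = 1e-6) -> float:
    """Central-difference estimate of |df/dchi|, i.e. arc length per unit chi."""
    return abs(f(chi + h) - f(chi - h)) / (2.0 * h)


for chi in (0.3, 1.0, 2.5):
    radius = abs(alpha(chi) - 0.5j)   # distance from the centre i/2
    print(chi, radius, speed(alpha, chi), speed(beta, chi))
```

Equal intervals in $\chi$ map to equal arc lengths only for $\alpha$, which is why an uncertainty in $\chi$ that depends on $N$ alone carries over to $\alpha$ but not to $\beta$.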



3 Distinguishability 

We have now found a unique transformation from a probability to another class of variables
exemplified by $\chi$ in eq.(6). These unique variables always become better known with additional
data. But can they be considered physical? We should first clarify what a physical variable is. A
physical variable can assume different numerical values, where each value should not only imply
a different physical situation, but should most of all lead to a different measurement result in a
properly designed experiment. Within the probabilistic paradigm two measurement results are
different when their uncertainty intervals don't overlap. This can be used to define a variable which
counts the results of the measurement of a probability that are distinguishable in principle. Comparison
of that variable to our quantity $\chi$ should tell us how much $\chi$ must change from a given value
before this can be noticed in an experiment. Following Wootters and Wheeler |||| the variable 
$\theta$ counting the statistically distinguishable results at detector 1 in $N$ runs of our above example
is given by

$$\theta = \int_0^{p_1} \frac{dp}{\Delta p(p)} = \sqrt{N} \left[\arcsin(2 p_1 - 1) + \frac{\pi}{2}\right], \tag{9}$$

where $\Delta p$ is defined as in eq.(2). When dividing $\theta$ by $\sqrt{N}$ it becomes identical to $\chi$ when in
eq.(6) we set $C = 1$ and $D = \pi/2$. This illuminates the meaning of $\chi$. It is a continuous variable
associated with a probability, with the particular property that anywhere in its domain an interval
of fixed width corresponds to an equal number of measurement results distinguishable in a given
number of runs. With Occam's dictum of not introducing more entities than are necessary for
the description of the subject matter under investigation, $\chi$ would be the choice for representing
physical situations and can rightly be called physical.
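
The link between the counting variable and the arcsine form can be checked numerically. The sketch assumes that the number of distinguishable results is the integral of $dp/\Delta p(p)$ from 0 to $p_1$, with the binomial standard error for $\Delta p$. This is an assumption, chosen to be consistent with the statement that dividing by $\sqrt{N}$ gives the associated variable with $C = 1$ and $D = \pi/2$. The substitution $p = u^2$ is a numerical convenience introduced here to remove the integrable singularity at $p = 0$.

```python
import math


def theta_numeric(p1: float, n_runs: int, steps: int = 2000) -> float:
    """Midpoint-rule estimate of sqrt(N) * integral_0^p1 dp / sqrt(p (1 - p)).

    With p = u**2 the integrand becomes 2 / sqrt(1 - u**2), which is smooth
    on [0, sqrt(p1)] for p1 < 1.
    """
    u_max = math.sqrt(p1)
    h = u_max / steps
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) * h
        total += 2.0 / math.sqrt(1.0 - u * u) * h
    return math.sqrt(n_runs) * total


def theta_closed(p1: float, n_runs: int) -> float:
    """Closed form sqrt(N) * (arcsin(2 p1 - 1) + pi/2)."""
    return math.sqrt(n_runs) * (math.asin(2.0 * p1 - 1.0) + math.pi / 2.0)


print(theta_numeric(0.9, 100), theta_closed(0.9, 100))
```

The two values should agree to within the discretisation error of the midpoint rule.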

4 A Simple Prediction: The Superposition Principle 

Now we return to our aim of finding a strategy for maximum predictive power. We want to see 
whether the unique class of variables represented by x indicates a way beyond representing data 
and perhaps affords special predictions. For the sake of concreteness we think of the double slit 
experiment. A particle can reach the detector by two different routes. We measure the probability
that it hits the detector via the left route, $p_L$, by blocking the right slit. In $L$ runs we get $n_L$
counts. In the measurement of the probability with only the right path available, $p_R$, we get $n_R$
counts in $R$ runs. From these data we want to make a prediction about the probability $p_{tot}$, when
both paths are open. Therefore we make the hypothesis that $p_{tot}$ is a function of $p_R$ and $p_L$. What
can we say about the function $p_{tot}(p_L, p_R)$ when we demand maximum predictive power from it?
This question is answered by reformulating the problem in terms of the associated variables $\chi_L$,
$\chi_R$ and $\chi_{tot}$, which we derive according to eq.(6) by setting $C = 1$ and $D = \pi/2$. The function
$\chi_{tot}(\chi_L, \chi_R)$ must be such that a prediction for $\chi_{tot}$ has an uncertainty interval $\delta\chi_{tot}$, which only
depends on the number of runs, $L$ and $R$, and decreases with both of them. (We use the symbol
$\delta\chi_{tot}$ to indicate that it is not derived from a measurement of $p_{tot}$, but from other measurements
from which we want to predict $p_{tot}$.) In this way we can predict the accuracy of $\chi_{tot}$ by only
deciding the number of runs, $L$ and $R$. No actual measurements need to have been done. Because
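
$$\delta\chi_{tot} = \sqrt{\left|\frac{\partial\chi_{tot}}{\partial\chi_L}\right|^2 \frac{1}{L} + \left|\frac{\partial\chi_{tot}}{\partial\chi_R}\right|^2 \frac{1}{R}}, \tag{10}$$

maximum predictive power is achieved when

$$\left|\frac{\partial\chi_{tot}}{\partial\chi_j}\right| = \text{constant}, \quad j = L, R. \tag{11}$$

We want to have a real function $\chi_{tot}(\chi_L, \chi_R)$, and therefore we get

$$\chi_{tot} = a\chi_L + b\chi_R + c, \tag{12}$$
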
where $a$, $b$ and $c$ are real constants. Furthermore we must have $c = 0$ and the magnitude of both
$a$ and $b$ equal to 1, when we wish to have $\chi_{tot}$ equivalent to $\chi_R$ or to $\chi_L$ when either the one or
the other path is blocked. So there is an ambiguity of sign with $a$ and $b$. When rewriting this in
terms of the probability we get

$$p_{tot} = \sin^2\left(\frac{\chi_L \pm \chi_R}{2}\right). \tag{13}$$

This does not look like the sum rule of probability theory. Only for $p_L + p_R = 1$ does it coincide
with it. We may therefore conclude that the sum rule of probability theory does not afford
maximum predictive power. But neither does eq.(13) look like the quantum mechanical superposition
principle. However, this should not be surprising because our input was just two real valued
numbers, $\chi_L$ and $\chi_R$, from which we demanded to derive another real valued number. A general
phase as is provided in quantum theory could thus not be incorporated. But let us see what we
get with complex representatives of the associated variables of probabilities. We take $\alpha(\chi)$ from
eq.(8). Again we define in an equivalent manner $\alpha_L$, $\alpha_R$ and $\alpha_{tot}$. From $p_L$ we have for instance
(from (8) and (7) with $C = 1$ and $D = \pi/2$)

$$\alpha_L = \sqrt{p_L}\, e^{i \arcsin\sqrt{p_L}}. \tag{14}$$

If we postulate a relationship $\alpha_{tot}(\alpha_R, \alpha_L)$ according to maximum predictive power we expect
the predicted uncertainty interval $\delta\alpha_{tot}$ to be independent of $\alpha_L$ and $\alpha_R$ and to decrease with
increasing number of runs, $L$ and $R$. Analogous to (11) we must have

$$\left|\frac{\partial\alpha_{tot}}{\partial\alpha_j}\right| = \text{constant}, \quad j = L, R, \tag{16}$$

yielding

$$\alpha_{tot} = s\alpha_L + t\alpha_R + u, \tag{17}$$

where $s$, $t$, and $u$ are complex constants. Now $u$ must vanish and $s$ and $t$ must both be unimodular
when $p_{tot}$ is to be equivalent to either $p_L$ or $p_R$ when the one or the other route is blocked. We
then obtain

$$p_{tot} = |\alpha_{tot}|^2 = p_L + p_R + 2\sqrt{p_L p_R}\cos\phi, \tag{18}$$

where $\phi$ is an arbitrary phase containing the phases of $s$ and $t$. This is exactly the quantum
mechanical superposition principle. What is striking is that with a theory of maximum predictive
power we can obtain the general form of this principle, but cannot at all predict $p_{tot}$ even when
we have measured $p_L$ and $p_R$, because of the unknown phase $\phi$. So we are led to postulate $\phi$ as
a new measurable quantity in this experiment.
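
The two combination rules of this section can be compared in a short sketch. It assumes $C = 1$ and $D = \pi/2$, so that $p = \sin^2(\chi/2)$, and the complex representative $\alpha = \sin(\chi/2)\,e^{i\chi/2}$. The function names and the parameters `phase_s` and `phase_t` (the phases of the unimodular constants $s$ and $t$) are illustrative. In this sketch the angle entering the interference term is the relative phase of the two summands, so it also absorbs the phases of $\alpha_L$ and $\alpha_R$.

```python
import cmath
import math


def chi_of(p: float) -> float:
    """Associated variable for C = 1, D = pi/2."""
    return math.asin(2.0 * p - 1.0) + math.pi / 2.0


def p_of(chi: float) -> float:
    """Inverse transformation: p = sin^2(chi / 2)."""
    return math.sin(chi / 2.0) ** 2


def p_tot_real(p_l: float, p_r: float, sign: int = 1) -> float:
    """Real rule: add (or subtract) the associated variables, transform back."""
    return p_of(chi_of(p_l) + sign * chi_of(p_r))


def alpha_of(p: float) -> complex:
    """Complex representative sin(chi/2) * exp(i chi/2) of a probability."""
    chi = chi_of(p)
    return math.sin(chi / 2.0) * cmath.exp(1j * chi / 2.0)


def p_tot_complex(p_l: float, p_r: float,
                  phase_s: float = 0.0, phase_t: float = 0.0) -> float:
    """Complex rule: |s alpha_L + t alpha_R|^2 with unimodular s and t."""
    s = cmath.exp(1j * phase_s)
    t = cmath.exp(1j * phase_t)
    return abs(s * alpha_of(p_l) + t * alpha_of(p_r)) ** 2


print(p_tot_real(0.3, 0.7), p_tot_real(0.2, 0.3))
print(p_tot_complex(0.2, 0.3, 0.7, 0.1))
```

For complementary inputs such as 0.3 and 0.7 the real rule with the plus sign reproduces the ordinary sum, while for 0.2 and 0.3 it does not. The complex rule returns a value that varies with the chosen phases, which reflects the statement that the total probability cannot be predicted from the two single-path probabilities alone.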

5 Conclusion 

We have tried to obtain insight into the quantum mechanical superposition principle and set 
out with the idea that it might follow from a most natural assumption of experimental science: 
more data should provide a more accurate representation of the matter under investigation and 
afford more accurate predictions. From this we defined the concept of maximum predictive power 
which demands laws to be such that the uncertainty of a prediction is solely dependent on the 
number of experiments on which the prediction is based, and not on the specific outcomes of these 
experiments. Applying this to the observation of two probabilities and to possible predictions 
about a third probability therefrom, we arrived at the quantum mechanical superposition principle. 
Our result suggests nature's law to be such that from more observations more accurate predictions 
must be derivable. 